{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using [FindACode](https://www.findacode.com/tools/map-a-code/cpt-hcpcs-ccs.php)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There's different ways of mapping the procedure codes to CCS\n",
    " - one is to use the coded min max distance within a range to allocate a token to a code as done in the python script\n",
    " - the other is to use the findacode mapping system and manually entering the codes and obtaining the tokens\n",
    "     - for this you need to create an account and enter all the possible codes that can happen in the dataset\n",
    "     - then download all the corresponding output to create a procedure CCS map\n",
    "     \n",
    " \n",
    "This is different from the diagnosis code mapping."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MIMIC III example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "import csv\n",
    "import pandas as pd\n",
    "import os\n",
    "import pickle\n",
    "## PATH TO CPTEVENTS FILE IN MIMIC III\n",
    "cpt_events_path = './data/CPTEVENTS.csv'\n",
    "df_cptevents = pd.read_csv(cpt_events_path)\n",
    "\n",
    "\n",
    "# extract all CPT codes\n",
    "def filt(x):\n",
    "    reg = r'[a-zA-Z]+'\n",
    "    if (len(re.findall(reg, str(x)))):\n",
    "        return str(x)\n",
    "    else:\n",
    "        return str(x)\n",
    "        \n",
    "all_cpts = list(set(map(filt, list(df_cptevents['CPT_CD']))))\n",
    "\n",
    "# write them to text files in chunks of 500 as there is a \n",
    "# code limit entry for the free version in findacode\n",
    "for i in range(len(all_cpts) // 500 + 1):\n",
    "    with open('all_cpts{}.txt'.format(i), 'w') as f:\n",
    "        for item in all_cpts[i*500:500*(i+1)]:\n",
    "            f.write(\"{}\\n\".format(item))\n",
    "    f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Next enter all the text files into findacode and download the corresponding \n",
    "### cpt-hcpcs-ccs.csv files\n",
    "\n",
    "I have done this for MIMIC III although not sure if sharing the text files is legal.\n",
    "\n",
    "\n",
    "**TODO**: look into this and release into repo...."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = './'\n",
    "files = os.listdir(path)\n",
    "\n",
    "# for example if you had 5 text files in the previous step\n",
    "# there should be 5 cpt-hcpcs-ccs.csv files\n",
    "# below code simply concatenates them\n",
    "files = [f for f in files if 'cpt-hcpcs-ccs' in f]\n",
    "df_map = pd.DataFrame()\n",
    "for f in files:\n",
    "    tdf = pd.read_csv(os.path.join(path, f))\n",
    "    df_map = df_map.append(tdf, ignore_index=True)\n",
    "df_map = df_map.rename(columns={'css': 'ccs'}) # named incorrectly from dump on the findacode website\n",
    "                                               # last time i ran this was in 2019, they might have fixed this\n",
    "                                               # check the output csv files.\n",
    "df_map = df_map.dropna()\n",
    "\n",
    "code_ccs_map = dict(zip(df_map.code, df_map.ccs))\n",
    "with open('./code_ccs_map.pkl', 'wb') as handle:\n",
    "    pickle.dump(code_ccs_map, handle, protocol=pickle.HIGHEST_PROTOCOL)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (tf)",
   "language": "python",
   "name": "myenv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
